Caio Raphael

CPU registers are small, ultra-fast storage locations inside the processor. They sit at the very top of the memory hierarchy—faster than cache and RAM—and are directly accessed by the CPU’s execution units.
- Accessing RAM is ~100x slower than register access, often more (a trip to DRAM can cost hundreds of cycles)
- Good compilers and low-level code try to:
  - Minimize memory access
  - Maximize register reuse
At the hardware level, registers are implemented as flip-flop arrays (or similar circuits) inside the CPU. They are grouped into a register file, which is a small block of storage with multiple read/write ports.
- They are directly wired into the execution units (ALU, FPU, etc.).
- The register file is multi-ported, so several registers can be read at once (e.g., 2 reads + 1 write per cycle).
Registers act as:
- Operands for arithmetic/logic operations
- Temporary storage during instruction execution
- Control state (e.g., instruction pointer, flags)
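Creel - 16-bit registers
Creel - 32- and 64-bit registers
- "As soon as you do an operation with 32-bit registers, the top is wiped; I don't know why."
  - Why: on x86-64, writing a 32-bit register (e.g., EAX) zero-extends the result into the full 64-bit register. AMD64 defines it this way so the new value never depends on the old upper 32 bits, which avoids a false dependency on the register's previous contents.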
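A minimal Python sketch of the two behaviors, modeling only the architectural result (not the hardware): a 32-bit write replaces all of RAX, while a 16-bit write has to merge with the old value. `write_eax` and `write_ax` are hypothetical helper names used only for illustration.

```python
MASK64 = (1 << 64) - 1  # keep values within 64 bits


def write_eax(rax: int, value: int) -> int:
    """Model `mov eax, value`: the 32-bit result is zero-extended into RAX.

    The old RAX value is deliberately ignored: bits 32-63 become 0.
    """
    return value & 0xFFFF_FFFF


def write_ax(rax: int, value: int) -> int:
    """Model `mov ax, value`: only bits 0-15 change; bits 16-63 are kept."""
    return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)


rax = 0xDEAD_BEEF_0000_0000
print(hex(write_eax(rax, 0x1234)))  # 0x1234
print(hex(write_ax(rax, 0x1234)))   # 0xdeadbeef00001234
```

The merge in `write_ax` (it reads the old RAX) is exactly the dependency on the previous value that the 32-bit zero-extension rule avoids.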

Register Naming / Register Renaming

The registers you see in assembly are not the real physical storage inside modern CPUs.
Modern CPUs often use register renaming internally.
In the past, a register name referred to a real, fixed location in the chip. That is no longer the case. Now the register name goes through the RAT (register alias table), which translates the names you wrote (RAX, for example) into physical registers, the "slots" used by micro-ops.
In hardware:
- CPUs have many more physical registers than the ISA exposes; renaming maps the architectural names onto them
In assembly:
- Registers look fixed and scarce.
- They are named variables.
- Register names are not universal. They depend on the CPU architecture (ISA), not on the assembly language syntax itself.
- Assembly language is just a human-readable representation of that ISA, so:
  - Different CPUs → different register sets and names.
  - Same CPU (same ISA) → mostly the same registers, even across different assemblers (only the spelling may change, e.g. rax in Intel syntax vs %rax in AT&T syntax).

Why

Reason: renaming removes false dependencies between instructions that merely reuse a register name. This avoids hazards and enables out-of-order execution.
Consider this assembly:

```
mov rax, 5
add rax, 3
mov rax, 10
add rax, 2
```

What looks like dependencies:
- Instruction 2 depends on 1 → OK (real dependency)
- Instruction 4 depends on 3 → OK (real dependency)
But here’s the issue:
- Instruction 3 reuses rax
- This creates a false dependency between unrelated computations: instruction 3 must not overwrite rax until instruction 2 is done with the old value
The CPU might think "I must wait before reusing rax", even though the two computations are independent.
The CPU fixes this by secretly doing something like:
| Assembly register     | Physical register |
| --------------------- | ----------------- |
| rax (first use)       | P1                |
| rax (after overwrite) | P7                |

So internally, it effectively transforms the code into:

```
mov rax, 5      ; P1 = 5
add rax, 3      ; P2 = P1 + 3   (every write gets a fresh physical register)

mov rax, 10     ; P7 = 10       (NEW register, unrelated to P1/P2)
add rax, 2      ; P8 = P7 + 2
```

Now the two sequences are completely independent, and the CPU can execute them in parallel or out of order.
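A toy Python model of that transformation, assuming each instruction is a hypothetical (text, dest, sources) tuple: sources are looked up in the current mapping, then the destination gets a brand-new physical register. The P-numbers come from a simple counter, so they won't match the numbers used above; what matters is that the second chain never reads the first chain's registers.

```python
from itertools import count


def rename(program, first_preg=1):
    """Rename every destination to a fresh physical register.

    program: list of (text, dest, sources); dest/sources are
    architectural names such as "rax".
    Returns a list of (text, dest_preg, source_pregs).
    """
    rat = {}                   # architectural name -> current physical register
    fresh = count(first_preg)  # trivial allocator: P1, P2, P3, ...
    renamed = []
    for text, dest, sources in program:
        # Read sources through the *current* mapping: true (RAW) deps survive.
        src_pregs = [rat[s] for s in sources]
        # Every write gets a new physical register: WAR/WAW disappear.
        dest_preg = f"P{next(fresh)}"
        rat[dest] = dest_preg
        renamed.append((text, dest_preg, src_pregs))
    return renamed


program = [
    ("mov rax, 5", "rax", []),
    ("add rax, 3", "rax", ["rax"]),
    ("mov rax, 10", "rax", []),
    ("add rax, 2", "rax", ["rax"]),
]
for text, dst, srcs in rename(program):
    print(f"{text:12} ; {dst} <- {srcs}")
```

The second `add` reads only P3, so nothing links it to P1 or P2 and the two chains can run in parallel.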
True dependency (RAW — Read After Write)
```
mov rax, 5
add rbx, rax   ; must wait
```
- Cannot be removed by renaming
- Real data flow: the add needs the value the mov produced
False dependencies (renaming fixes these)
- WAR (Write After Read)
```
mov rbx, rax
mov rax, 5
```
- WAW (Write After Write)
```
mov rax, 5
mov rax, 10
```
- These are false constraints caused by the limited number of register names, not by real data flow.
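A small Python sketch of the three cases, treating each instruction as a hypothetical (dest, sources) pair. It ignores flags and memory operands, which real hardware also has to track.

```python
# Hazard types that involve only reused names, so renaming can remove them.
FALSE_DEPENDENCIES = {"WAR", "WAW"}


def classify(first, second):
    """Return the hazards `second` has on `first` (in program order).

    Each instruction is (dest, sources): the register it writes and
    the registers it reads.
    """
    d1, s1 = first
    d2, s2 = second
    hazards = set()
    if d1 in s2:
        hazards.add("RAW")  # true: second reads what first wrote
    if d2 in s1:
        hazards.add("WAR")  # false: second overwrites what first reads
    if d1 == d2:
        hazards.add("WAW")  # false: both write the same name
    return hazards


def after_renaming(hazards):
    """Only true (RAW) dependencies survive register renaming."""
    return hazards - FALSE_DEPENDENCIES


# add rax, 3 (reads and writes rax) followed by mov rax, 10 (writes rax)
h = classify(("rax", ["rax"]), ("rax", []))
print(sorted(h))                  # ['WAR', 'WAW']
print(sorted(after_renaming(h)))  # []
```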

RAT (register alias/allocation table)

x86-64 assembly exposes 16 general-purpose registers (plus 16 vector registers, or 32 with AVX-512), but the RAT maps them onto a physical register file with on the order of a couple hundred entries in a modern core, and more in newer chips.
The reason for this is to extract parallelism from instructions that are independent of each other and keep the pipeline busy: the core tries to fill its out-of-order window (the in-flight instructions it can choose from) with work that is ready to run.
The RAT's job is to rewrite an instruction stream that names only 16 registers into one that uses many more physical registers, so that independent dependency chains no longer collide on the same names.
It keeps track of:
- rax → P1
Then when a new write happens:
- rax → P7 (mapping changes)
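Old physical registers (like P1) are kept until it is safe to free them, i.e. until the instruction that replaced the mapping retires, at which point no in-flight instruction can still need the old value
This is coordinated with the reorder buffer (ROB), which retires instructions in program order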
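A toy Python model of this bookkeeping, assuming a physical-register-file design in which a destination's previous physical register is freed when the instruction that overwrote it retires. The `Renamer` class and its methods are hypothetical; real hardware uses dedicated tables and free-list logic, not dictionaries.

```python
from collections import deque


class Renamer:
    """Toy RAT + free list + in-order retirement (a simplified sketch)."""

    def __init__(self, arch_regs, num_pregs):
        self.free = deque(f"P{i}" for i in range(num_pregs))
        # Each architectural register starts mapped to its own physical register.
        self.rat = {r: self.free.popleft() for r in arch_regs}
        self.rob = deque()  # in-flight instructions, oldest first

    def rename(self, dest, sources):
        if not self.free:
            raise RuntimeError("no free physical register: rename must stall")
        src_pregs = [self.rat[s] for s in sources]
        new = self.free.popleft()
        old = self.rat[dest]
        self.rat[dest] = new
        # The ROB entry remembers the old mapping so it can be freed later.
        self.rob.append((dest, new, old))
        return new, src_pregs

    def retire(self):
        """Retire the oldest instruction; its *old* mapping is now dead."""
        _, _, old = self.rob.popleft()
        self.free.append(old)
        return old


r = Renamer(["rax", "rbx"], num_pregs=8)  # rax -> P0, rbx -> P1
r.rename("rax", [])         # mov rax, 5  -> writes P2
r.rename("rax", ["rax"])    # add rax, 3  -> writes P3, reads P2
r.retire()                  # frees P0 (previous rax)
r.retire()                  # frees P2
```

Freeing the old mapping at retirement is safe because retirement happens in program order: any instruction that read the old register came earlier and has already retired.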

Types of hardware registers

General-purpose registers (GPRs)
- Hold arbitrary values
Special-purpose registers
- Program Counter (PC / RIP)
- Stack Pointer (SP / RSP)
- Status/Flags register (RFLAGS on x86-64)
Vector/SIMD registers
- For data-parallel operations (e.g., XMM/YMM/ZMM registers used by SSE/AVX)

Sub-Registers

“Sub-registers” are simply smaller portions of a larger CPU register.
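On x86-64, a 64-bit general-purpose register can be accessed at smaller widths. Writes to the 8- and 16-bit sub-registers leave the rest of the register untouched, while writes to the 32-bit sub-register zero the upper 32 bits.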

| Register | Size   | Notes                                                       |
| -------- | ------ | ----------------------------------------------------------- |
| RAX      | 64-bit | Full 64-bit register                                        |
| EAX      | 32-bit | Lower 32 bits of RAX; upper 32 bits are zeroed when written |
| AX       | 16-bit | Lower 16 bits of RAX                                        |
| AH       | 8-bit  | High 8 bits of AX (bits 8–15 of RAX)                        |
| AL       | 8-bit  | Low 8 bits of AX (bits 0–7 of RAX)                          |
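A minimal Python sketch of the views in the table, computing each sub-register from a 64-bit RAX value with masks and shifts. `sub_registers` is a hypothetical helper, not an existing API.

```python
def sub_registers(rax: int) -> dict:
    """Return the x86-64 sub-register views of a 64-bit RAX value."""
    rax &= (1 << 64) - 1  # keep only 64 bits
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFF_FFFF,  # bits 0-31
        "AX": rax & 0xFFFF,        # bits 0-15
        "AH": (rax >> 8) & 0xFF,   # bits 8-15
        "AL": rax & 0xFF,          # bits 0-7
    }


for name, value in sub_registers(0x1122_3344_5566_7788).items():
    print(f"{name:3} = {value:#x}")
```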

So when you do:

```
mov al, 0x12   ; sets the lowest 8 bits of RAX
```

Only the lowest 8 bits are changed.
RAX keeps its original values in bits 8–63. Zero-extension only happens when you write a 32-bit register (such as EAX) or use an explicit zero-extending instruction such as movzx.
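The partial write can be modeled the same way: clear the target byte, then OR in the new value. `write_byte` is a hypothetical helper, and it models only the architectural result; real cores may need an extra merge step internally for partial-register writes.

```python
MASK64 = (1 << 64) - 1  # keep values within 64 bits


def write_byte(rax: int, value: int, shift: int) -> int:
    """Write 8 bits at bit offset `shift` (0 for AL, 8 for AH), keeping the rest."""
    field = 0xFF << shift
    return (rax & ~field & MASK64) | ((value & 0xFF) << shift)


rax = 0xFFFF_FFFF_FFFF_FFFF
rax = write_byte(rax, 0x12, 0)  # mov al, 0x12
print(hex(rax))                 # 0xffffffffffffff12
rax = write_byte(rax, 0x34, 8)  # mov ah, 0x34
print(hex(rax))                 # 0xffffffffffff3412
```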
Why this exists
- Legacy compatibility: x86 has been 16-bit → 32-bit → 64-bit.
- Flexibility: allows operations on smaller data types without touching the whole register.
- Efficiency: some operations only need 8 or 16 bits; no need to touch all 64 bits.